---
title: "Alcohol Consumption Frequency Significantly Correlated with Vocabulary but not Cards Memory Task Performance in Adults on The Islands"
author: "Belal Hajjaj, Aaron Jin, Justin Yu, Brandon Luong, Modhar Al Qasser"
date: "2024-03-27"
geometry: margin = 0.5in
output: 
  pdf_document:
      fig_caption: true
      extra_dependencies: ["float"]
---

## Author Contributions

BH, AJ, JY, BL, and MA - Conceptualization and planning of research\
JY, AJ, BH - Design of methodologies\
MA, AJ, JY, BL, and BH - Data collection and recording\
JY - Annotation, management, and processing of data\
JY - Statistical analysis\
JY - Visualization and data presentation\
JY, BH, AJ, MA, and BL - Drafting report\
JY, BH, AJ, MA, and BL - Review and revision of final report

## Introduction

Alcohol is a socially accepted drug that is frequently consumed around the world, despite its harmful health effects (Nutt et al., 2021). Alcohol is a known causal factor for more than 200 diseases and is implicated in 5.3% of yearly global deaths (\~3 million individuals yearly). It has also been linked to \~132 million disability-adjusted life years (DALYs), due to the burden of disease and injury caused by its overconsumption (Park & Kim, 2020). In addition to the physical damage alcohol can cause, there are also many prominent effects on brain function (Nutt et al., 2021). The adverse effects of maternal alcohol use during pregnancy on brain development in children have been extensively studied in the context of fetal alcohol syndrome (Nutt et al., 2021). Changes in brain structure and cognition in adults with alcoholism have also been widely observed in CT and MRI scans (Nutt et al., 2021).

Despite scientific interest in the neurological effects of alcohol, its effect on memory is a newly emerging topic with few conclusive study results (Gough et al., 2021; Butterworth et al., 2023; Compo et al., 2017). Many conflicting results have been observed, and meta-analyses of the current data demonstrate a need for more evidence (Brennan et al., 2020; Topiwala et al., 2017). Some studies show a positive impact of moderate alcohol consumption on cognitive function (Brennan et al., 2020). However, other studies suggest that even light drinking can lead to accelerated cognitive decline (Topiwala et al., 2017).

This cross-sectional observational study aims to address the question, “Is there a correlation between different levels of regular alcohol consumption and performance in memory and recall tasks?” By analyzing data from individuals with a wide range of alcohol consumption frequencies, we hope to determine whether there is an overall positive or negative correlation between alcohol consumption and memory ability. We hypothesize that alcohol consumption levels will correlate with a change in performance on memory tasks. By analyzing the relationship between alcohol consumption and memory, we aim to uncover valuable insights in various fields such as psychology, neuroscience, public health, and public policy (Butterworth et al., 2023; Compo et al., 2017).

## Materials and Methods

Participants were sampled from the village of Arcadia on the island of Providence. Arcadia was chosen for its population size of 4339, which is larger than other villages. Sampling from a larger population allows the study to represent a greater range of individuals, improving generalizability. Since there was no feasible method of obtaining a list of all individuals to take a simple random sample, a multistage sampling strategy was employed. Random houses were selected from all houses in Arcadia, and a random participant was then selected from each house. Only adults at least 19 years of age were included, to ensure participants were old enough to drink.

Each house in Arcadia has a number, so Excel `=RANDBETWEEN(1, 1571)` was used to generate a list of 200 house numbers out of the 1571 total houses in Arcadia. From each selected house, the adults were numbered in order of appearance. R `sample(number of adults, 1)` was used to generate a random number out of the number of potential participants. The corresponding individual was then enrolled into the study if consent was given. Selected individuals who did not give consent were excluded from the study. Numbers corresponding to an empty house or to a duplicate individual were also removed. This resulted in a final sample size of n = 163 participants. Using a probability sampling method with random number generation minimizes potential researcher-driven selection biases which could have arisen if non-probability (e.g. convenience) sampling strategies were used.

An issue with this usage of Excel random number generation is reproducibility. A new random sequence of numbers would be generated if the study were repeated, so it would be difficult to replicate and validate the results. A potential solution would be to use only R with a specific seed for random number generation. Setting a seed allows the same sequence of numbers to be generated each time. For example, 200 numbers between 1 and 1571 could be generated as house numbers using `sample(1571, 200, replace = TRUE)`. Then, since there is a variable number of adults in each household, a different range of numbers would have to be generated each time. If the seed were reset before each individual draw, a single call such as `sample(3, 1)` would always return the same number. So, a list of 200 numbers between 1 and 2, 1 and 3, 1 and 4, etc. could all be generated beforehand. These lists of numbers would then be merged into a dataframe and exported as a csv for use during data collection. For each house number, the randomly generated participant number from the corresponding column would be used, based on the number of adults found in that house. An example of how this may be implemented is shown below.

```{r}
# set seed for reproducibility
set.seed(2024)

# randomly sample with replacement to generate random integers in a range
houses <- sample(1571, 200, replace = TRUE)
two <- sample(2, 200, replace = TRUE)
three <- sample(3, 200, replace = TRUE)
four <- sample(4, 200, replace = TRUE)

# combine into dataframe
randoms <- data.frame(houses = houses, two = two, three = three, four = four)

# export as csv for use in Excel during data collection
write.csv(randoms, "datarand.csv")
```
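
The lookup step described above could be sketched as follows. This is a minimal illustration only: `pick_participant()` is a hypothetical helper that was not part of the study, and the table is shortened to 5 rows for readability.

```r
# Hypothetical lookup table in the same shape as the exported CSV
set.seed(2024)
randoms <- data.frame(
  houses = sample(1571, 5, replace = TRUE),
  two    = sample(2, 5, replace = TRUE),
  three  = sample(3, 5, replace = TRUE),
  four   = sample(4, 5, replace = TRUE)
)

# Hypothetical helper: return the pre-generated participant number for a row,
# given how many adults were found in that house
pick_participant <- function(tbl, row, n_adults) {
  if (n_adults == 1) return(1L)  # only one adult, so no draw is needed
  cols <- c("two", "three", "four")
  stopifnot(n_adults >= 2, n_adults <= length(cols) + 1)
  tbl[[cols[n_adults - 1]]][row]
}

# e.g. the house in row 3 turns out to have 3 adults
pick_participant(randoms, row = 3, n_adults = 3)
```

Households with more than four adults would need additional pre-generated columns in the same pattern.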

After sampling was completed, all participants were first asked "How often do you drink alcohol?". The responses of each participant could consistently be grouped within one of the ordinal categories "Rarely/Once or twice a year", "Once or twice each season", "Several times each season", "Once each day", or "Couple of times each day". Records for individuals who did not provide consent, empty houses, and duplicates were marked with the corresponding entry of "NO CONSENT", "EMPTY", "DUPLICATE", or "NONE" for alcohol consumption. The participants were then asked to take two types of memory tests: a cards test and a vocabulary test. These tests were scored out of 10 and 20 respectively. Participants were also asked "How forgetful do you feel right now?" to categorize the subjective forgetfulness of each participant. Responses fell into the ordinal categories "Not at all", "A little", and "Moderately".

```{r, fig0, fig.align='center', out.width = "600px", echo = FALSE}
knitr::include_graphics("design.pdf")
```

```{r, fig1, out.width = "350px", fig.align='center', echo = FALSE, fig.cap = "Sampling and measurement processes of the study."}
knitr::include_graphics("forgetfulness.png")
```

Graphical and numerical summaries of cards test scores, vocabulary test scores, and forgetfulness by alcohol consumption were first generated to look for any overall trends and potential associations. To verify whether any alcohol consumption groups significantly differed from one another, one-way ANOVA F tests and Kruskal-Wallis tests were used to detect potential differences in memory test scores between groups. If a significant difference was detected in these analyses, pairwise t-tests were then used to determine which groups had significantly different means. A 5% significance level was used since there was no clear need to set a different alpha to minimize a specific type of error. To check for association between the categorical groups of alcohol consumption and self-reported forgetfulness, chi-squared tests and Monte Carlo simulation chi-squared tests were used.

## Results

```{r setup, include = FALSE}
library(tidyverse)
library(kableExtra)
```

```{r include = FALSE}
# import data

raw <- read_csv("dat_new.csv")
```

```{r include = F}
# standardize data formatting

data <- raw %>%
  # keep only entries with data
  filter(Alcohol_consumption != "DUPLICATE" & Alcohol_consumption != "EMPTY" & Alcohol_consumption != "NO CONSENT" & Alcohol_consumption != "NONE") %>% 
  # clean up formatting
  mutate(Name = str_to_title(Name), # change all names to title case
        
        # standardize alcohol consumption format
        Alcohol_consumption = case_when(
          str_detect(Alcohol_consumption, fixed("several times each season", ignore_case = T)) ~ "Several times each season",
          str_detect(Alcohol_consumption, fixed("once or twice each season", ignore_case = T)) ~ "Once or twice each season",
          str_detect(Alcohol_consumption, fixed("once or twice a year", ignore_case = T)) ~ "Once or twice a year",
          str_detect(Alcohol_consumption, fixed("rarely", ignore_case = T)) ~ "Once or twice a year",
          str_detect(Alcohol_consumption, fixed("couple of times each day", ignore_case = T)) ~ "Couple of times each day",
          str_detect(Alcohol_consumption, fixed("coupleof times each day", ignore_case = T)) ~ "Couple of times each day",
          str_detect(Alcohol_consumption, fixed("drink each day", ignore_case = T)) ~ "Once each day"
        ),
        
        # standardize forgetfulness format
        Forgetful = case_when(
          str_detect(Forgetful, fixed("not at all", ignore_case = T)) ~ "Not at all",
          str_detect(Forgetful, fixed("a little", ignore_case = T)) ~ "A little",
          str_detect(Forgetful, fixed("moderately", ignore_case = T)) ~ "Moderately",
          str_detect(Forgetful, fixed("moderatly", ignore_case = T)) ~ "Moderately" # typo in one entry
        )
  )

# order by increasing alcohol consumption
data$Alcohol_consumption <- factor(data$Alcohol_consumption,
                                   levels = c("Once or twice a year",
                                              "Once or twice each season",
                                              "Several times each season",
                                              "Once each day",
                                              "Couple of times each day")
                                  )
```

To make comparisons between the quantitative cards memory test scores of different alcohol consumption categories, side-by-side boxplots were used (Figure 2). Means were also added to the plots as an additional factor for comparison. This graphical summary allows for the preliminary visualization of potential associations between alcohol consumption frequency and cards memory test scores. Numerical summaries were then calculated to more precisely examine the centers and sample sizes of the groups (Table 1).

```{r fig2, fig.cap = "Side-by-side boxplots of Cards Memory Test Score by Alcohol Consumption Frequency. Means shown in red.", message=FALSE, warning=FALSE}
data %>% 
  # create boxplots
  ggplot(aes(x = Alcohol_consumption, y = Cards)) +
  geom_boxplot() +
  # add means
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = "red", fill = "red") +
  theme_light() +
  theme(axis.text.x = element_text(angle = 10, vjust = 0.5)) +
  labs(x = "Alcohol consumption frequency", y = "Cards Memory Test Score (/10)")
```

```{r tab1}
df <- data %>%
  group_by(Alcohol_consumption) %>% 
  
  # calculate medians and means
  summarise("Number of participants" = n(),
            "Median Cards Score (/10)" = median(Cards),
            "Mean Cards Score (/10)" = mean(Cards)
  )

kable(df,
      caption = "Numerical summaries for Card Memory Test Score by Alcohol Consumption Frequency") %>%
  kable_styling()
```

From Figure 2, there appears to be a moderate negative association between alcohol consumption frequency and performance on cards memory tests. The median scores appear to generally trend downwards with increasing consumption frequency, but the differences are small, ranging only between 7 and 9. The means behave similarly, with a slightly clearer trend. The *Couple of times each day* group does not appear to follow the trend, since it has both a higher median and mean compared to groups of less frequent alcohol consumption. The spread of data varies considerably between groups, potentially due to differences in sample sizes. The boxplot for the *Once each day* group appears especially unusual due to the very small sample size (n = 5) and presence of outliers. The numerical summary in Table 1 supports the presence of a negative association, as both the median and mean cards scores generally appear to decrease with greater consumption. However, it can also be seen from Table 1 that a large majority of participants are in the *Once or twice each season* or *Several times each season* categories. Very few participants were in any of the other categories (n $\leq$ 10), making these results less reliable.

Side-by-side boxplots (Figure 3) and numerical summaries (Table 2) were also used in a similar manner for the vocabulary memory tests.

```{r fig3, fig.cap = "Side-by-side boxplots of Vocabulary Memory Test Score by Alcohol Consumption Frequency. Means shown in red."}

data %>% 
  # create boxplots
  ggplot(aes(x = Alcohol_consumption, y = Vocab)) +
  geom_boxplot() +
  # add means
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3, color = "red", fill = "red") +
  theme_light() +
  theme(axis.text.x = element_text(angle = 10, vjust = 0.5)) +
  labs(x = "Alcohol consumption frequency", y = "Vocabulary Memory Test Score (/20)")
```

```{r tab2}
df <- data %>%
  group_by(Alcohol_consumption) %>% 
  
  # calculate medians and means
  summarise("Number of participants" = n(),
            "Median Vocab Score (/20)" = median(Vocab),
            "Mean Vocab Score (/20)" = mean(Vocab)
  )

kable(df,
      caption = "Numerical summaries for Vocabulary Memory Test Score by Alcohol Consumption Frequency") %>%
  kable_styling()
```

As with the cards memory tests, the boxplots for the vocabulary memory test scores in Figure 3 seem to show a negative association with increasing alcohol consumption. There may be a stronger trend in this test, as the medians differ more from each other, ranging from 13 to 18.5. The scoring scheme of this test is out of 20 rather than 10, which may contribute to better sensitivity. The means follow a similar trend to the medians, but the means for the higher consumption groups are very close to each other. Interestingly, the *Couple of times each day* group does follow the trend for these scores, having the lowest median and mean. Once again, differences in the spread of data are observed between groups, possibly because of varying sample sizes. Table 2 supports the presence of a negative association between alcohol consumption frequency and vocabulary memory scores as well, with the medians and means generally lower for greater consumption.

Self-reported forgetfulness and alcohol consumption frequency are both categorical variables. A two-way table was first produced to examine the data (Table 3). Due to the varying sample sizes, many categories, and presence of cells with no observations, a mosaic plot would not be suitable for this data. So, in order to see whether forgetfulness varies with alcohol consumption, a stacked bar chart of the conditional distributions of forgetfulness by alcohol consumption was created (Figure 4).

```{r tab3}
# reorder forgetful levels
forgetfuldat <- data %>%
  mutate(Forgetful = factor(Forgetful, levels = c("Not at all", "A little", "Moderately")))

# create two-way table
forgetfultab <- table(forgetfuldat$Alcohol_consumption, forgetfuldat$Forgetful)

kable(forgetfultab,
      caption = "Two-way table of Forgetfulness by Alcohol Consumption Frequency") %>%
  kable_styling() %>% 
  # add labels
  add_header_above(c("Alcohol Consumption" = 1, "Forgetfulness" = 3))
```

```{r fig4, fig.cap = "Conditional distributions of Forgetfulness by Alcohol Consumption Frequency"}
data %>%   
  # order levels
  mutate(Forgetful = factor(Forgetful, levels = c("Moderately", "A little", "Not at all"))) %>% 
  ggplot(aes(x = Alcohol_consumption, fill = Forgetful)) +
  # use position = "fill" for conditional distributions
  geom_bar(position = "fill") + 
  theme_light() +
  labs(x = "Alcohol consumption frequency", y = "Proportion") +
  theme(axis.text.x = element_text(angle = 10, vjust = 0.5))
```

There does not appear to be a substantial association from the conditional distribution bar chart in Figure 4. The *Once each day* group has a much greater proportion of moderate forgetfulness, but this may be due to chance or sampling error. Table 3 shows that there were only 5 total responses which reported drinking *Once each day*, and of those, only 3 reported *Moderate* forgetfulness. From Figure 4, the proportion of *Moderate* responses actually decreases with increasing alcohol consumption over several other groups. However, the lowest alcohol consumption group, *Once or twice a year*, does have the highest proportion of *Not at all* forgetful responses and no *Moderate* forgetfulness responses. Additionally, the two highest consumption groups have the lowest proportions of *Not at all* forgetfulness responses.

\newpage
Although an association between memory game scores and alcohol consumption frequency appears possible from the graphical and numerical summaries, these results might be caused by chance, and this correlation may not truly be present in the population. To determine the extent to which these data actually provide evidence of an association, one-way ANOVAs were used to test for significant differences between mean cards and vocabulary memory test scores across alcohol consumption groups. A 5% significance level was used. If no significant differences are detected, the data does not provide evidence that there is a real correlation between alcohol consumption and memory scores. If a significant difference is detected between one or more alcohol consumption groups, the data does provide evidence to support a correlation between alcohol consumption and memory scores. Further tests could then be used to determine which groups are significantly different from each other.

Due to the multistage sampling method employed, there may not be an equal chance of each individual being selected due to differences in the number of residents in each house. However, there should be no dependency between individuals since the sample randomly selected both houses and residents. Since this was an observational study and participants were not assigned to groups beforehand, the groups should be independent from each other. To check if the population $\sigma$s might be the same for the groups, the sample standard deviations (Tables 4-5) and strip plots of residuals (Figures 5-6) were examined for both memory tests.
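
The unequal selection chances mentioned above can be illustrated with a short calculation. This is a sketch under simplifying assumptions: the household sizes are hypothetical, and only a single house draw is considered.

```r
n_houses <- 1571            # total houses in Arcadia (from the Methods)
adults   <- c(1, 2, 3, 4)   # hypothetical numbers of adults per house

# chance that one specific adult is picked on a single house draw:
# P(house drawn) * P(adult drawn | house drawn)
p_single_draw <- (1 / n_houses) * (1 / adults)

# selection chance relative to an adult living alone
relative <- p_single_draw / p_single_draw[1]

data.frame(adults, p_single_draw, relative)
```

Under these assumptions, an adult in a four-adult household is a quarter as likely to be selected as an adult living alone.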
\newpage
```{r tab4}
# calculate n and stdev for cards
sd_cards <- data %>% 
  group_by(Alcohol_consumption) %>% 
  summarise(n = n(), sd = sd(Cards))

kable(sd_cards,
      caption = "Standard deviations of Cards Memory Scores by Alcohol Consumption Frequency") %>%
  kable_styling()
```

```{r tab5}
# calculate n and stdev for vocab
sd_vocab <- data %>% 
  group_by(Alcohol_consumption) %>% 
  summarise(n = n(), sd = sd(Vocab))

# display via kable
kable(sd_vocab,
      caption = "Standard deviations of Vocab Memory Scores by Alcohol Consumption Frequency") %>%
  kable_styling()
```
```{r fig5, fig.cap = "Strip plot of residuals for Cards memory test."}
# run ANOVA first to get residuals
anova_cards <- aov(data$Cards ~ data$Alcohol_consumption)

# create dataframe for strip plot
strip <- data.frame(Alcohol_consumption = data$Alcohol_consumption, Residuals = anova_cards$residuals)

# create strip plot
strip %>% 
  # color code
  ggplot(aes(Alcohol_consumption, Residuals, colour = Alcohol_consumption)) +
  # hide color legend, ensure 0 vertical jitter, reduce horizontal jitter
  geom_jitter(show.legend = F, height = 0, width = 0.2) +
  theme_light() +
  # rotate labels
  theme(axis.text.x = element_text(angle = 10, vjust = 0.5)) +
  labs(x = "Alcohol consumption frequency")
```

```{r fig6, fig.cap = "Strip plot of residuals for Vocabulary memory test."}
# run ANOVA first to get residuals
anova_vocab <- aov(data$Vocab ~ data$Alcohol_consumption)

# create dataframe for strip plot
strip <- data.frame(Alcohol_consumption = data$Alcohol_consumption, Residuals = anova_vocab$residuals)

# create strip plot
strip %>% 
  # color code
  ggplot(aes(Alcohol_consumption, Residuals, colour = Alcohol_consumption)) +
  # hide color legend, ensure 0 vertical jitter, reduce horizontal jitter
  geom_jitter(show.legend = F, height = 0, width = 0.2) +
  theme_light() +
  # rotate labels
  theme(axis.text.x = element_text(angle = 10, vjust = 0.5)) +
  labs(x = "Alcohol consumption frequency")
```

\newpage

From Tables 4-5 and Figures 5-6, the groups look like they may belong to populations with different standard deviations. As a result, it may not be appropriate to assume the observed ANOVA F statistic will follow an F distribution, and the ANOVA p-value may not be reliable.
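
One informal way to summarise such a comparison is the ratio of the largest to the smallest sample standard deviation. The sketch below uses hypothetical values rather than the study's data, and the cutoff of 2 is a common rule of thumb, not a formal test and not necessarily the criterion applied in this report.

```r
# hypothetical group standard deviations (not the study's values)
group_sd <- c(a = 1.2, b = 1.9, c = 2.1, d = 3.0, e = 0.9)

# ratio of the largest to the smallest sample SD
sd_ratio <- max(group_sd) / min(group_sd)
sd_ratio

# a ratio above 2 is often treated as a warning sign for unequal variances
sd_ratio > 2
```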

To check for normality in the data, normal Q-Q plots and histograms of the residuals for each memory test were used (Figures 7-10).

```{r fig7, fig.cap = "Normal Q-Q Plot of residuals for Cards memory test."}
qqnorm(anova_cards$residuals, main = NULL)
qqline(anova_cards$residuals)
```

```{r fig8, fig.cap = "Histogram of residuals for Cards memory test."}
hist(anova_cards$residuals, breaks = 10, main = NULL, xlab = "Residuals")
```

\newpage

```{r fig9, fig.cap = "Normal Q-Q Plot of residuals for Vocabulary memory test.", out.width = "75%", out.height = "75%", fig.align = 'center'}
qqnorm(anova_vocab$residuals, main = NULL)
qqline(anova_vocab$residuals)
```

```{r fig10, fig.cap = "Histogram of residuals for Vocabulary memory test", out.width = "80%", out.height = "80%", fig.align = 'center'}
hist(anova_vocab$residuals, breaks = 10, main = NULL, xlab = "Residuals")
```

\newpage

From Figures 7-10, the residuals for both tests appear to be strongly left-skewed and not normally distributed. This pronounced left skew suggests memory test scores may be left-skewed in the population. This is another factor contributing to the potential unreliability of a one-way ANOVA test on this data.

Since the normality and equal variance conditions for the one-way ANOVAs were violated, a non-parametric test may be a better alternative. The Kruskal-Wallis rank sum test is a non-parametric analogue to one-way ANOVA which does not assume normality and equal variance, and only requires independence (Kruskal & Wallis, 1952).
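
To show what the test is doing, the following sketch computes the Kruskal-Wallis statistic by hand on simulated data and compares it with `kruskal.test()`. The group names and scores are hypothetical, and continuous simulated scores are used so that no tie correction is needed.

```r
set.seed(1)
# simulated, tie-free scores for three hypothetical groups
scores <- c(rnorm(8, mean = 10), rnorm(12, mean = 11), rnorm(6, mean = 12))
group  <- factor(rep(c("low", "mid", "high"), times = c(8, 12, 6)))

# rank all observations together, then sum the ranks within each group
r         <- rank(scores)
N         <- length(scores)
rank_sums <- tapply(r, group, sum)
n_i       <- tapply(r, group, length)

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# p-value from the chi-squared approximation with k - 1 degrees of freedom
p <- pchisq(H, df = nlevels(group) - 1, lower.tail = FALSE)

# the built-in test should agree when there are no ties
kt <- kruskal.test(scores, group)
c(by_hand = H, built_in = unname(kt$statistic))
```

Because the statistic depends only on ranks, extreme scores and skewed distributions have less influence than they do on the ANOVA F statistic. Real test scores are integers and will contain ties, in which case `kruskal.test()` applies a correction that this sketch omits.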

```{r}
# ANOVA F tests
anova_cards <- aov(data$Cards ~ data$Alcohol_consumption)
anova_cards_sum <- summary(anova_cards)

anova_vocab <- aov(data$Vocab ~ data$Alcohol_consumption)
anova_vocab_sum <- summary(anova_vocab)
```

```{r include = F}
# extract test statistic from anova outputs
statistics <- c(anova_cards_sum[[1]][["F value"]][1], anova_vocab_sum[[1]][["F value"]][1])

# extract p-value from anova outputs
ps <- c(anova_cards_sum[[1]][["Pr(>F)"]][1], anova_vocab_sum[[1]][["Pr(>F)"]][1])

# create vector of test labels
tests <- c("Cards one-way ANOVA", "Vocab one-way ANOVA")

# create table
anova_results <- data.frame(tests, statistics, ps)
names(anova_results) <- c("Test Type", "F statistic", "P value")
```

```{r}
# display results - process shown in full code
kable(anova_results, caption = "One-way ANOVA F test results") %>% kable_styling()
```

```{r}
# kruskal tests
krus_cards <- kruskal.test(Cards ~ Alcohol_consumption, data = data)
krus_vocab <- kruskal.test(Vocab ~ Alcohol_consumption, data = data)
```

```{r include = F}

# extract test statistic from kruskal
statistics <- c(krus_cards$statistic, krus_vocab$statistic)

# extract p-value from kruskal
ps <- c(krus_cards$p.value, krus_vocab$p.value)

# create vector of test labels
tests <- c("Cards Kruskal-Wallis", "Vocab Kruskal-Wallis")

# create table
krus_results <- data.frame(tests, statistics, ps)
names(krus_results) <- c("Test Type", "Kruskal-Wallis chi-squared", "P-value")
```

```{r}
# display results - process shown in full code
kable(krus_results, caption = "Kruskal-Wallis rank sum test results") %>% kable_styling()
```

Since a significant difference was observed for the mean vocabulary memory test scores in the one-way ANOVA and Kruskal-Wallis tests (p = 0.012 and p = 0.007 respectively), a pairwise t-test using the Bonferroni correction was used to examine which groups have significantly different mean scores.

\
\

```{r}
# run pairwise t test
# use as.data.frame to store p-values in data frame for presentation
pairwise <- as.data.frame(pairwise.t.test(data$Vocab, data$Alcohol_consumption, p.adjust.method = "bonferroni")$p.value)

# replace NAs with - for clarity
pairwise[is.na(pairwise)] <- "-"

# display results
kable(pairwise,
      caption = "Pairwise comparisons for Vocab scores by Alcohol
      Consumption using t tests and Bonferroni correction") %>% kable_styling()
```
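
The Bonferroni correction itself is simple arithmetic, sketched here with hypothetical p-values rather than the study's results: each unadjusted p-value is multiplied by the number of comparisons and capped at 1.

```r
# hypothetical unadjusted p-values from three pairwise comparisons
p_raw <- c(0.004, 0.03, 0.40)

# Bonferroni: multiply by the number of comparisons, cap at 1
m      <- length(p_raw)
p_bonf <- pmin(1, p_raw * m)
p_bonf

# same adjustment via the built-in helper
p.adjust(p_raw, method = "bonferroni")

# with 5 consumption groups, the number of pairwise comparisons is
choose(5, 2)
```

With ten comparisons among five groups, an unadjusted p-value must fall below 0.005 to remain significant at the 5% level after correction, which makes the procedure conservative.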

Considering the two categorical variables of alcohol consumption and self-reported forgetfulness, a Chi-squared test would be appropriate to test whether the two variables appear to be independent or dependent. A small p-value from this test provides evidence that the two variables are dependent, suggesting that there is some correlation between them.

The Chi-squared test requires independent observations, which was established when evaluating the one-way ANOVA conditions above. The counts can successfully be organized into a two-way table, as seen in Table 3. Also from Table 3, it is apparent that there are several cells which have fewer than 5 observations and even some cells with 0 observations. So, the Chi-squared test results may not be accurate.

Since the sample size conditions of the Chi-squared test were not met, a Monte Carlo simulation Chi-squared test may be more appropriate. This test uses randomization, and does not require \>5 observations in each cell (Hope, 1968).
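
The idea behind the simulation can be sketched by hand. The table below is hypothetical, and the code is an illustration of the general approach (simulating tables with the observed margins under independence) rather than the exact internals of `chisq.test()`.

```r
set.seed(2024)
# hypothetical 3 x 3 table of counts with some sparse cells
tab <- matrix(c(6, 2, 0,
                9, 5, 1,
                3, 4, 2), nrow = 3, byrow = TRUE)

# Pearson chi-squared statistic for a table of counts
chisq_stat <- function(m) {
  expected <- outer(rowSums(m), colSums(m)) / sum(m)
  sum((m - expected)^2 / expected)
}

observed <- chisq_stat(tab)

# simulate tables with the same row and column totals under independence
B         <- 2000
sims      <- r2dtable(B, rowSums(tab), colSums(tab))
sim_stats <- vapply(sims, chisq_stat, numeric(1))

# proportion of simulated statistics at least as large as the observed one
p_mc <- (1 + sum(sim_stats >= observed)) / (B + 1)
p_mc
```

Because the p-value comes from the simulated distribution rather than the chi-squared approximation, sparse cells do not invalidate it, although the result varies slightly with the seed and the number of replicates `B`.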

```{r chisq, message=FALSE, warning=FALSE}
# normal chi-squared
chis <- chisq.test(data$Alcohol_consumption, data$Forgetful)

# monte carlo
# set seed for reproducibility
set.seed(2024)
monte <- chisq.test(data$Alcohol_consumption, data$Forgetful, simulate.p.value = TRUE)
```

```{r include = F}
# extract test statistics
statistics <- c(chis$statistic, monte$statistic)

# extract p-values
ps <- c(chis$p.value, monte$p.value)

# create vector of test labels
tests <- c("Chi-squared", "Monte Carlo Simulation")

# create table
chis_results <- data.frame(tests, statistics, ps)
names(chis_results) <- c("Test Type", "Chi-squared statistic", "P-value")
```

```{r}
# display results - process shown in full code
kable(chis_results, caption = "Self-reported Forgetfulness Chi-squared results") %>% kable_styling()
```

## Conclusions

Since the p-value for the one-way ANOVA on mean cards memory scores (0.128) was greater than the significance level of 0.05, there is no evidence against the mean cards score for all alcohol consumption frequencies being the same. The p-value for the corresponding Kruskal-Wallis rank sum test on cards memory scores (0.141) was also greater than the significance level of 0.05[^1]. So, this data does not provide evidence for a correlation between alcohol consumption frequency and Cards memory test scores.

[^1]: Note that the Kruskal-Wallis rank sum test should not be interpreted as detecting differences between means when the distributions of groups are significantly different, but rather detecting differences between the groups in general (Kruskal & Wallis, 1952).

The p-value for the one-way ANOVA on mean vocabulary memory scores (p = 0.012) was smaller than the significance level of 0.05, but not extremely small. This suggests that the data provides moderate evidence against the mean vocabulary score for all alcohol consumption frequencies being the same. The p-value for the corresponding Kruskal-Wallis rank sum test on vocabulary memory scores (p = 0.007) was somewhat smaller than the p-value from the one-way ANOVA. As a result, the data may actually provide strong evidence for a significant difference in vocabulary memory test scores for one or more alcohol consumption frequency groups. This supports the presence of an association between alcohol consumption frequency and vocabulary memory performance.

The subsequent pairwise t-tests on mean vocabulary memory scores did not find significant differences between most groups (Table 8). However, one exception to this was between the *Several times each season* and *Once or twice a year* groups (p = 0.0216), which indicates that there was a statistically significant difference between them. This agrees with the one-way ANOVA results that at least one of the means differs from the others.

The Chi-squared test between self-reported forgetfulness and alcohol consumption frequency yielded a non-significant p-value of 0.1888. The corresponding Monte Carlo simulation also yielded a similar non-significant p-value of 0.1814. As a result, there is no evidence that the level of self-reported forgetfulness is dependent on the frequency of alcohol consumption.

## Discussion

The study’s overall goal was to determine if there is a correlation between alcohol consumption and memory performance. Since alcohol is widely consumed, the impact of these results has far reaching effects in fields like neuroscience, physiology, psychology, and public health. These results ultimately contribute to the understanding of the neurological effects of alcohol on cognitive processes. Increased understanding of the negative impacts of alcohol can also help lawmakers make informed decisions about public health policy.

In both the one-way ANOVAs and Kruskal-Wallis tests, the vocabulary memory test scores showed a statistically significant difference (ANOVA p = 0.012), but the cards scores did not (ANOVA p = 0.128). This suggests that alcohol consumption may have a more prominent effect on complex linguistic tasks, such as word recall, than on simpler and more direct card memorization tasks. There is some limited research supporting this view. One previous study suggested that alcohol has a greater effect on memory for information recalled in sequential order or using covert rehearsal, the act of internally vocalizing words to remember them (Saults et al., 2007). The vocabulary test involves remembering a list of 20 words, so participants are likely to employ covert rehearsal and think in sequential order during recall in this test. The cards memory test simply involves remembering 10 random cards, which may not rely as strongly on these highly impaired memory strategies.

The conditions for the Chi-squared test were not met, so its result should be interpreted with caution, but it provided no evidence of an association between the level of forgetfulness and the level of alcohol consumption. The Monte Carlo simulation, which does not rely on those conditions, likewise found no statistically significant association between the two variables. There is some research that indicates that alcohol does not impair the ability to recall existing memories but rather affects the areas of the brain that are responsible for the generation of new long-term memories (White, 2003). The self-reported forgetfulness levels are based on the individuals’ perception of their ability to recall previous memories, while the memory tests depend on their abilities to form new memories.

This study faced several methodological limitations that must be considered when interpreting the results. Because it was observational, it can only establish correlation, not causation, between alcohol consumption levels and memory abilities. Furthermore, confounding variables may have influenced the results. For example, sleep, exercise, diet, and the time since an individual’s last drink directly influence memory abilities, and their distributions may not have been the same in each alcohol consumption group, especially given the small sample sizes (Pickersgill et al., 2022). The probability sampling approach reduced some confounding, but confounding cannot be completely avoided in an observational study.

The pairwise t-tests on vocabulary memory performance summarized in Table 8 showed strong evidence of a difference between the "Once or twice a year" and "Several times each season" groups, but not between any other pair of groups. Table 2 shows that the majority of study participants fall into these two groups. This suggests that the sample sizes in the remaining groups may simply be too small for the study to have sufficient power to detect other differences, so Type II error may be a significant concern. In a future study, a larger sample may improve power and reduce the likelihood of Type II errors. It may also be beneficial to use a more quantitative and specific measure of alcohol consumption, which should allow participants to be spread more evenly among categories.

Response bias might be another source of error, since study participants self-reported their alcohol consumption levels and degree of forgetfulness. Previous studies found that participants tend to under-report or over-report when given a survey (Black & Cole, 2001). For example, overweight or obese individuals tend to underreport their daily caloric intake (Dahle et al., 2021). The vocabulary and cards memory tests also do not fully measure the cognitive capabilities of individuals, due to the limited nature of each test. Additionally, some Arcadia residents refused to participate in this study, which can cause non-response bias. Respondents may therefore differ from non-respondents, which highlights the difficulty of obtaining a representative sample of a given population.

Although the differences in cards memory test scores were not statistically significant, the graphical summaries seemed to show a negative correlation similar to that of the vocabulary memory test scores. Rather than reflecting a difference in memory type, the non-significant result may therefore be due in part to insufficient power from small sample sizes. From Table 1, there were only 5 and 9 participants in the two highest alcohol consumption categories.

Overall, the results from this study appear to support the hypothesis that alcohol consumption has a negative impact on memory and cognitive function, at least for vocabulary recall. This aligns with other studies that found a similar negative effect (Topiwala et al., 2017). The next step would be to expand the study to more geographic and demographic populations, which would provide additional evidence by increasing the sample size and improving generalizability. Different tests and analyses could also be used to determine the degree of the effect of alcohol consumption on memory and cognitive abilities. Another potential area of investigation is the role of genetics in how alcohol affects the memory of different individuals. This could be done by identifying biological markers associated with increased susceptibility to negative memory and cognitive effects.

\newpage
## References

Black, A. E., & Cole, T. J. (2001). Biased Over- Or Under-Reporting is Characteristic of Individuals Whether Over Time or by Different Assessment Methods. *Journal of the American Dietetic Association*, 101(1), 70–80. https://doi.org/10.1016/S0002-8223(01)00018-9

Brennan, S. E., McDonald, S., Page, M. J., Reid, J., Ward, S., Forbes, A. B., & McKenzie, J. E. (2020). Long-term effects of alcohol consumption on cognitive function: a systematic review and dose-response analysis of evidence published between 2007 and 2018. *Systematic Reviews*, 9(1), 33. https://doi.org/10.1186/s13643-019-1220-4

Butterworth, B., Hand, C. J., Lorimer, K., & Gawrylowicz, J. (2023). The impact of post-encoding alcohol consumption on episodic memory recall and remember-know responses in heavy drinkers. *Frontiers in Psychology*, 14, 1007477. https://doi.org/10.3389/fpsyg.2023.1007477

Dahle, J. H., Ostendorf, D. M., Zaman, A., Pan, Z., Melanson, E. L., & Catenacci, V. A. (2021). Underreporting of energy intake in weight loss maintainers. *The American Journal of Clinical Nutrition*, 114(1), 257–266. https://doi.org/10.1093/ajcn/nqab012

Gough, T., Christiansen, P., Rose, A. K., & Hardman, C. A. (2021). The effect of acute alcohol consumption on meal memory and subsequent food intake: Two laboratory experiments. *Appetite*, 163, 105225. https://doi.org/10.1016/j.appet.2021.105225

Hope, A. C. A. (1968). A Simplified Monte Carlo Significance Test Procedure. *Journal of the Royal Statistical Society. Series B (Methodological)*, 30(3), 582–598. http://www.jstor.org/stable/2984263

Kruskal, W. H., & Wallis, W. A. (1952). Use of Ranks in One-Criterion Variance Analysis. *Journal of the American Statistical Association*, 47(260), 583–621. https://doi.org/10.2307/2280779

Nutt, D., Hayes, A., Fonville, L., Zafar, R., Palmer, E. O. C., Paterson, L., & Lingford-Hughes, A. (2021). Alcohol and the Brain. *Nutrients*, 13(11), 3938. https://doi.org/10.3390/nu13113938

Park, S. H., & Kim, D. J. (2020). Global and regional impacts of alcohol use on public health: Emphasis on alcohol policies. *Clinical and Molecular Hepatology*, 26(4), 652–661. https://doi.org/10.3350/cmh.2020.0160

Pickersgill, J. W., Turco, C. V., Ramdeo, K., Rehsi, R. S., Foglia, S. D., & Nelson, A. J. (2022). The Combined Influences of Exercise, Diet and Sleep on Neuroplasticity. *Frontiers in Psychology*, 13, 831819. https://doi.org/10.3389/fpsyg.2022.831819 

Saults, J. S., Cowan, N., Sher, K. J., & Moreno, M. V. (2007). Differential Effects of Alcohol on Working Memory: Distinguishing Multiple Processes. *Experimental and Clinical Psychopharmacology*, 15(6), 576–587. https://doi.org/10.1037/1064-1297.15.6.576 

Schreiber Compo, N., Carol, R. N., Evans, J. R., Pimentel, P., Holness, H., Nichols-Lopez, K., Rose, S., & Furton, K. G. (2017). Witness Memory and Alcohol: The Effects of State-Dependent Recall. *Law and Human Behavior*, 41(2), 202–215. https://doi.org/10.1037/lhb0000224 

Topiwala, A., Allan, C. L., Valkanova, V., Zsoldos, E., Filippini, N., Sexton, C., Mahmood, A., Fooks, P., Singh-Manoux, A., Mackay, C. E., Kivimäki, M., & Ebmeier, K. P. (2017). Moderate alcohol consumption as risk factor for adverse brain outcomes and cognitive decline: longitudinal cohort study. *BMJ (Online)*, 357, j2353. https://doi.org/10.1136/bmj.j2353

White, A. M. (2003). What happened? Alcohol, memory blackouts, and the brain. *Alcohol Research & Health: The Journal of the National Institute on Alcohol Abuse and Alcoholism*, 27(2), 186–196. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6668891/

\newpage
## Appendix

**Dataset Info**

| Variable Name       | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|--------------------|----------------------------------------------------|
| House               | House number of participant. Corresponds to house numbers in Arcadia on The Islands. Randomly generated using Excel.                                                                                                                                                                                                                                                                                                                                                                  |
| Name                | Name of participant.                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| Alcohol_consumption | Alcohol consumption frequency. As reported by the participant after being asked "How often do you drink alcohol?". Responses fall into one of the following categories: ("Rarely/Once or twice a year", "Once or twice each season", "Several times each season", "Once each day", "Couple of times each day"). Categorical ordinal variable. Participants who did not provide consent, empty houses, or duplicates have "NO CONSENT", "EMPTY", "DUPLICATE", or "NONE" in this entry. |
| Cards               | Cards memory test score. Number of cards recalled in one minute, out of 10 drawn from a 52 card deck. Numerical discrete variable.                                                                                                                                                                                                                                                                                                                                                    |
| Vocab               | Vocabulary memory test score. Number of words recalled in 30 seconds, out of a list of 20 words seen for 1 minute. Numerical discrete variable.                                                                                                                                                                                                                                                                                                                                       |
| Forgetful           | Subjective forgetfulness. As reported by the participant after being asked "How forgetful do you feel right now?". Responses fall into one of the following categories: ("Not at all", "A little", "Moderately"). Categorical ordinal variable.                                                                                                                                                                                                                                       |

[The dataset can be accessed by clicking here (access file -\> download as csv).](https://dataverse.harvard.edu/file.xhtml?fileId=10058517&version=1.0)
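
For readers who want to work with the downloaded file, the following is a minimal sketch of one way to read it and drop non-participant rows. It assumes that the CSV header uses exactly the variable names in the table above and that the test scores are stored as whole numbers; the function names and the example rows are hypothetical and are not taken from the dataset.

```python
import csv
import io
from statistics import mean

# Codes the table above lists for rows that are not participants.
NON_PARTICIPANT_CODES = {"NO CONSENT", "EMPTY", "DUPLICATE", "NONE"}

# Ordinal categories, from least to most frequent consumption.
CONSUMPTION_ORDER = [
    "Rarely/Once or twice a year",
    "Once or twice each season",
    "Several times each season",
    "Once each day",
    "Couple of times each day",
]


def load_participants(csv_file):
    """Read rows from an open CSV file, skipping non-participant entries."""
    participants = []
    for row in csv.DictReader(csv_file):
        consumption = row["Alcohol_consumption"].strip()
        if consumption in NON_PARTICIPANT_CODES:
            continue
        participants.append({
            "consumption": consumption,
            "cards": int(row["Cards"]),
            "vocab": int(row["Vocab"]),
            "forgetful": row["Forgetful"].strip(),
        })
    return participants


def mean_score_by_group(participants, score):
    """Mean of one score ("cards" or "vocab") per group, in ordinal order."""
    groups = {}
    for p in participants:
        groups.setdefault(p["consumption"], []).append(p[score])
    return {level: mean(groups[level])
            for level in CONSUMPTION_ORDER if level in groups}


# Made-up rows for illustration; replace with open("<downloaded file>.csv").
example = io.StringIO(
    "House,Name,Alcohol_consumption,Cards,Vocab,Forgetful\n"
    "101,Resident A,Once each day,5,8,A little\n"
    "102,,EMPTY,,,\n"
    "103,Resident B,Rarely/Once or twice a year,9,12,Not at all\n"
)
print(mean_score_by_group(load_participants(example), "vocab"))
```

Keeping the categories in an explicit ordered list preserves the ordinal nature of the alcohol consumption variable when results are summarized.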
